Deep learning, stochastic gradient descent and diffusion maps

Authors

Abstract

Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss functions on the landscape of over-parametrized neural networks are close to zero, while only a small number are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This indicates that the process of minima selection mainly happens in a relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seem to suggest that the dynamics may live on a lower-dimensional manifold. In this paper, we pursue a truly data-driven approach to the problem of getting a potentially deeper understanding of the high-dimensional loss surface, and in particular of the landscape traced out by SGD, by analyzing the data generated through SGD, or any other optimizer for that matter, in order to possibly discover (local) low-dimensional representations of the optimization landscape. As our vehicle for exploration, we use diffusion maps introduced by R. Coifman and coauthors.
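To make the approach concrete, here is a minimal, self-contained Python sketch of the idea: record parameter snapshots along a noisy gradient descent run on a toy ill-conditioned quadratic loss, then embed the snapshots with a basic diffusion map (Gaussian kernel, row-normalised Markov matrix, leading non-trivial eigenvectors). Everything here, including the toy loss, the helper names collect_sgd_trajectory and diffusion_map, and the bandwidth eps, is an illustrative assumption rather than the authors' implementation.

```python
# A minimal, illustrative sketch (not the authors' implementation):
# 1) record parameter snapshots along a noisy gradient descent run,
# 2) embed the snapshots with a basic diffusion map.
import numpy as np

def collect_sgd_trajectory(dim=50, steps=500, lr=0.05, noise=0.1, seed=0):
    """Noisy gradient descent on a toy ill-conditioned quadratic loss."""
    rng = np.random.default_rng(seed)
    # Few large Hessian eigenvalues, many near zero, mimicking the observation above.
    hess_diag = np.concatenate([10.0 * np.ones(5), 1e-3 * np.ones(dim - 5)])
    w, traj = rng.normal(size=dim), []
    for _ in range(steps):
        grad = hess_diag * w + noise * rng.normal(size=dim)  # stochastic gradient
        w = w - lr * grad
        traj.append(w.copy())
    return np.array(traj)                                    # shape (steps, dim)

def diffusion_map(X, eps=1.0, n_components=2):
    """Basic diffusion-map embedding of the rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-sq_dists / eps)                  # Gaussian kernel
    P = K / K.sum(axis=1, keepdims=True)         # row-normalised Markov matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)[1:n_components + 1]  # skip trivial eigenvector
    return evecs[:, order].real * evals[order].real      # diffusion coordinates

if __name__ == "__main__":
    snapshots = collect_sgd_trajectory()
    embedding = diffusion_map(snapshots, eps=5.0)
    print(embedding.shape)  # (500, 2): low-dimensional view of the SGD trajectory
```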

Similar resources

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

We design and implement a distributed multinode synchronous SGD algorithm, without altering hyperparameters, or compressing data, or altering algorithmic behavior. We perform a detailed analysis of scaling, and identify optimal design points for different networks. We demonstrate scaling of CNNs on 100s of nodes, and present what we believe to be record training throughputs. A 512 minibatch VGG...
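As a rough illustration of the synchronous data-parallel pattern this work builds on, the sketch below averages per-worker gradients each step before applying a single shared update. It is a generic Python simulation, not the paper's distributed implementation; the shard construction and toy least-squares loss are assumptions.

```python
# Generic synchronous data-parallel SGD pattern (gradient averaging per step);
# an illustrative simulation, not the distributed implementation in the paper.
import numpy as np

def synchronous_sgd_step(w, shards, grad_fn, lr=0.01):
    """One synchronous step: every worker computes a gradient on its shard,
    the gradients are averaged (an all-reduce in a real system), and all
    workers apply the same update, so parameters stay identical."""
    grads = [grad_fn(w, shard) for shard in shards]  # per-worker gradients
    return w - lr * np.mean(grads, axis=0)           # shared averaged update

# Toy usage: least-squares on synthetic data split across 4 simulated workers.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(128, 10)), rng.normal(size=128)
shards = [(X[i::4], y[i::4]) for i in range(4)]

def lsq_grad(w, shard):
    Xs, ys = shard
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

w = np.zeros(10)
for _ in range(100):
    w = synchronous_sgd_step(w, shards, lsq_grad, lr=0.05)
```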

Online Learning, Stability, and Stochastic Gradient Descent

In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CVloo stability is necessary and sufficient for generalization and consistency of ERM ([9]). In this note, we introduce CVon stability, which plays a similar role in online learning. We show that stochastic g...

Annealed Gradient Descent for Deep Learning

Stochastic gradient descent (SGD) has been regarded as a successful optimization algorithm in machine learning. In this paper, we propose a novel annealed gradient descent (AGD) method for non-convex optimization in deep learning. AGD optimizes a sequence of gradually improved smoother mosaic functions that approximate the original non-convex objective function according to an annealing schedul...
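The sketch below illustrates the general graduated-optimization idea behind such annealed schemes: run gradient descent on progressively less-smoothed surrogates of a non-convex objective. It does not reproduce the paper's specific mosaic-function construction; Gaussian smoothing with a Monte-Carlo gradient estimator is used instead, and all constants are illustrative.

```python
# Illustrative graduated-optimization sketch: descend on progressively
# less-smoothed versions of a non-convex objective. This substitutes Gaussian
# smoothing for the paper's mosaic-function construction.
import numpy as np

def smoothed_grad(f, x, sigma, n_samples=64, rng=None):
    """Monte-Carlo gradient of the Gaussian-smoothed objective E[f(x + sigma*u)]."""
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.normal(size=(n_samples, x.size))
    fx = np.array([f(x + sigma * ui) for ui in u])
    return (fx[:, None] * u).mean(axis=0) / sigma   # score-function estimator

def annealed_gd(f, x0, sigmas=(2.0, 1.0, 0.5, 0.1), steps=200, lr=0.05):
    """Gradient descent on a sequence of decreasingly smoothed objectives."""
    x = np.array(x0, dtype=float)
    rng = np.random.default_rng(0)
    for sigma in sigmas:                 # annealing schedule: less smoothing over time
        for _ in range(steps):
            x = x - lr * smoothed_grad(f, x, sigma, rng=rng)
    return x

# Toy non-convex objective with many local minima.
f = lambda x: np.sum(x ** 2) + 2.0 * np.sum(np.sin(5 * x))
print(annealed_gd(f, x0=2.0 * np.ones(3)))
```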

Learning Rate Adaptation in Stochastic Gradient Descent

The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...

Hilbert maps: scalable continuous occupancy mapping with stochastic gradient descent

The vast amount of data robots can capture today motivates the development of fast and scalable statistical tools to model the environment the robot operates in. We devise a new technique for environment representation through continuous occupancy mapping that improves on the popular occupancy grid maps in two fundamental aspects: 1) it does not assume an a priori discretisation of the world in...
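A minimal sketch in the spirit of such continuous occupancy models is given below: a logistic classifier over random Fourier features of 2D coordinates, trained with SGD, which can be queried at arbitrary continuous locations without a grid. The feature construction, synthetic data, and all constants are illustrative assumptions, not the paper's exact method.

```python
# Minimal continuous-occupancy sketch: logistic classifier over random Fourier
# features of 2D points, trained with SGD. Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating an RBF kernel over 2D locations.
D, gamma = 200, 2.0
W = rng.normal(scale=np.sqrt(2 * gamma), size=(2, D))
b = rng.uniform(0, 2 * np.pi, size=D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Synthetic sensor data: points inside a disc are "occupied" (label 1).
X = rng.uniform(-3, 3, size=(2000, 2))
y = (np.linalg.norm(X - np.array([1.0, 0.5]), axis=1) < 1.0).astype(float)

# SGD on the logistic loss; no grid discretisation of the map is needed.
w, lr = np.zeros(D), 0.5
for epoch in range(20):
    for i in rng.permutation(len(X)):
        f = phi(X[i:i + 1])[0]
        w -= lr * (sigmoid(f @ w) - y[i]) * f

# Query occupancy probability at arbitrary continuous coordinates.
print(sigmoid(phi(np.array([[1.0, 0.5], [-2.0, -2.0]])) @ w))
```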

Journal

Journal title: Journal of computational mathematics and data science

Year: 2022

ISSN: 2772-4158

DOI: https://doi.org/10.1016/j.jcmds.2022.100054